
Added AutoGPTQ readme and test #1147

Draft · wants to merge 3 commits into base: main
Conversation

* Added AutoGPTQ UINT4 to README.md

* Weight only quantization …dme, in AutoGPTQ (#305)

* Added SRAM_SLICER_SHARED_MME_INPUT_EXPANSION_ENABLED envar to the readme, in AutoGPTQ

* Update README.md with temp solution remark

* Update README.md
@libinta added the "synapse 1.17_dependency" label (PR not backward compatible; can be merged only when Synapse 1.17 is available) on Jul 22, 2024.

Llama2-7b in UINT4 weight-only quantization is enabled using the [AutoGPTQ Fork](https://github.com/HabanaAI/AutoGPTQ), which provides quantization capabilities in PyTorch.
Currently, only UINT4 inference of pre-quantized models is supported.
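
For context, inference with such a pre-quantized checkpoint looks roughly like the sketch below. It assumes the upstream AutoGPTQ API; the HabanaAI fork's device handling on Gaudi may differ, and the checkpoint name is only an example, not one this PR pins.

```python
# A minimal sketch, assuming the upstream AutoGPTQ API; the HabanaAI fork may
# differ in device placement on Gaudi. "TheBloke/Llama-2-7B-GPTQ" is just an
# example of a pre-quantized UINT4 (GPTQ) checkpoint.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "TheBloke/Llama-2-7B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# from_quantized() loads already-quantized UINT4 weights; no quantization runs here.
model = AutoGPTQForCausalLM.from_quantized(model_id)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```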

A collaborator commented:

What does "pre-quantized model" mean? By which process is the model pre-quantized, and at which precision?

The contributor (PR author) replied:

It means that we currently don't support the quantization process itself; we only support loading an existing quantized model.
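
For contrast, the unsupported quantization step itself looks roughly like the following sketch. It uses the upstream AutoGPTQ API and would be run outside Gaudi; the model name and calibration text are illustrative.

```python
# A sketch of producing a pre-quantized checkpoint with upstream AutoGPTQ.
# This quantization step is NOT what the fork enables on Gaudi; only loading
# the resulting checkpoint is. Names below are illustrative assumptions.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "meta-llama/Llama-2-7b-hf"
quantize_config = BaseQuantizeConfig(bits=4, group_size=128)  # 4-bit weight-only

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# GPTQ needs a small set of tokenized calibration examples.
examples = [tokenizer("AutoGPTQ calibration sample text.")]
model.quantize(examples)

# The saved directory is the "existing quantized model" that can then be loaded.
model.save_quantized("llama2-7b-gptq-uint4")
```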

@MrGeva left a comment:

DO NOT merge it yet, need to get approval first

@vidyasiv (Contributor) commented on Aug 2, 2024:

@HolyFalafel, is this PR ready for v1.17? Please resolve conflicts against the latest main.

@vidyasiv (Contributor) commented on Aug 2, 2024:

> DO NOT merge it yet, need to get approval first

@MrGeva, can you clarify whether this is targeted for v1.17?

@libinta removed the "synapse 1.17_dependency" label on Aug 5, 2024.
@emascarenhas (Contributor) commented:

Please sync your PR with main/upstream and fix any merge conflicts. Thank you.

@yuanwu2017 (Contributor) commented:

@HolyFalafel @libinta
Will this patch be merged? Gaudi support has already been integrated into AutoGPTQ, but I didn't see Optimum Habana (OH) install this package in this patch. I made a patch to integrate AutoGPTQ into tgi-gaudi and submitted a patch for optimum; it works. Can you add the AutoGPTQ package installation to this patch?

@emascarenhas (Contributor) commented:

Can you rebase and revise your patch if necessary, so that it merges cleanly with main?

Also do "pip install -U ruff; make style" and check for any issues.

In addition, please run "tests/ci/fast_tests.sh" after installing
pip install -e ".[tests]"
and any slow tests, e.g.,
GAUDI2_CI=1 RUN_SLOW=1 python -m pytest
tests/test_text_generation_example.py -v -s -k
and report the results here.
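
For reference, the quoted slow-test command can also be expressed through pytest's Python API. A minimal sketch follows, with the -k filter omitted because its pattern is truncated in the comment above.

```python
# Equivalent-in-spirit sketch of the quoted slow-test command, via pytest's
# Python API. The "-k" filter from the comment is omitted because its pattern
# is truncated in the original text.
import os
import pytest

os.environ["GAUDI2_CI"] = "1"  # mark a Gaudi2 CI environment
os.environ["RUN_SLOW"] = "1"   # opt in to slow tests
pytest.main(["tests/test_text_generation_example.py", "-v", "-s"])
```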

@imangohari1 (Contributor) commented:

> Can you rebase and revise your patch if necessary, so that it merges cleanly with main? […]

@HolyFalafel
Thanks for this draft PR. What is the status of this? If it is still needed, can you rebase, sync with the top of Habana main, and follow the instructions above to test it? If it is not needed, please close. Thanks.
